Skip to content

Add Qwen3VL MCore Export support from PR 895#1482

Open
hychiang-git wants to merge 25 commits into
mainfrom
hungyueh/pr-895
Open

Add Qwen3VL MCore Export support from PR 895#1482
hychiang-git wants to merge 25 commits into
mainfrom
hungyueh/pr-895

Conversation

@hychiang-git
Copy link
Copy Markdown
Contributor

@hychiang-git hychiang-git commented May 13, 2026

This PR is duplicated from PR #895. Since the original branch source is not available now, we create a new branch where we can update this PR.

What does this PR do?

new feature:

Overview: Add Qwen3-VL (Vision-Language) model support to the Megatron Core export/import
plugin, enabling HuggingFace-to-mcore weight conversion for PTQ/QAT/QAD workflows

Details

Qwen3-VL has a different weight structure from Qwen3 text-only models:

  • Language model weights are under model.language_model. prefix (not model.)
  • Visual encoder weights are under model.visual. prefix
  • The lm_head is at root level, not nested under language_model

This PR adds:

  • mcore_qwen3vl.py: Import/export weight mapping rules between HuggingFace
    Qwen3VLForConditionalGeneration and Megatron Core, handling the language_model prefix for
    all decoder layers, QKV merging/slicing, gated MLP merging/slicing, Q/K layer norms.
  • mcore_common.py: Registers Qwen3VLForConditionalGeneration in
    all_mcore_hf_export_mapping and all_mcore_hf_import_mapping.

Usage

From the comment:

Qwen3VL is supported within Megatron-Bridge, and pretraining and PEFT recipes for Qwen3VL are here and the core code logic here.

Create Megatron-LM/examples/post_training/modelopt/conf/Qwen/Qwen3-VL-8B-Instruct.sh :

#!/bin/bash
# Qwen3-VL-8B-Instruct text-model config for Megatron-LM import/quantize.
#
# Text-model dimensions are identical to Qwen3-8B (4096 hidden, 36 layers,
# 32 heads, GQA=8).  Differences: rope_theta=5000000, checkpoint path uses
# model.language_model.* prefix (handled by mcore_qwen3vl plugin).

if [ -z ${HF_MODEL_CKPT} ]; then
    HF_MODEL_CKPT=Qwen/Qwen3-VL-8B-Instruct
    TOKENIZER_MODEL=Qwen/Qwen3-VL-8B-Instruct
else
    TOKENIZER_MODEL=${HF_MODEL_CKPT}
fi

MODEL_ARGS=" \
    --save-interval 100000 \
    --micro-batch-size 1 \
    --bf16 \
    --no-masked-softmax-fusion \
    --disable-bias-linear \
    --untie-embeddings-and-output-weights \
    --position-embedding-type rope \
    --no-rope-fusion \
    --normalization RMSNorm \
    --swiglu \
    --num-layers 36 \
    --hidden-size 4096 \
    --ffn-hidden-size 12288 \
    --num-attention-heads 32 \
    --group-query-attention \
    --num-query-groups 8 \
    --kv-channels 128 \
    --qk-layernorm \
    --seq-length 4096 \
    --max-position-embeddings 262144 \
    --tokenizer-type HuggingFaceTokenizer \
    --make-vocab-size-divisible-by 1187 \
    --use-mcore-models \
    --rotary-percent 1.0 \
    --rotary-base 5000000 \
    --no-bias-swiglu-fusion \
"

Then, import Qwen3-VL from HuggingFace to MCore:

  • Directly on a machine with GPUs (local):
MLM_MODEL_CFG=Qwen/Qwen3-VL-8B-Instruct \                                                                                                                                                                                                                                                                                                                                      
HF_MODEL_CKPT=Qwen/Qwen3-VL-8B-Instruct \                                                                                                                                                                                                                                                                                                                                      
MLM_MODEL_SAVE=/tmp/qwen3vl_mcore \                                                                                                                                                                                                                                                                                                                                            
TP=1 \                                                                                                                                                                                                                                                                                                                                                                         
bash Megatron-LM/examples/post_training/modelopt/convert.sh Qwen/Qwen3-VL-8B-Instruct                                                                                                                                                                                                                                                                                  
  • To use it for quantize (PTQ via Megatron-LM path):
MLM_MODEL_CFG=Qwen/Qwen3-VL-8B-Instruct \                                                                                                                                                                                                                                                                                                                                      
HF_MODEL_CKPT=Qwen/Qwen3-VL-8B-Instruct \                                                                                                                                                                                                                                                                                                                            
QUANT_CFG=NVFP4_DEFAULT_CFG \                                                                                                                                                                                                                                                                                                                                                  
TP=4 \                                                                                                                                                                                                                                                                                                                                                                         
bash Megatron-LM/examples/post_training/modelopt/quantize.sh Qwen/Qwen3-VL-8B-Instruct 

Testing

  • Verified round-trip import/export with Qwen3-VL-8B-Instruct with the example usage above
  • Unit tests in tests/unit/torch/export/test_mcore_qwen3vl.py
    covering:
    • Registration in global export/import mappings
    • Import mapping: dense keys, model.language_model.
      prefix, lm_head. at root, QKVMerging, GatedMLPMerging, REPLICATE
      for layernorms, TP sharding configs
    • Export mapping: QKVSlicing, GatedMLPSlicing, no
      parallel_config
    • Import/export symmetry: same mcore keys, matching HF
      prefixes
    • Qwen3-VL vs Qwen3 difference: same keys, VL adds
      language_model. prefix, lm_head unchanged

Before your PR is "Ready for review"

  • Is this change backward compatible?: Yes, additive only
  • Did you write any new necessary tests?: Yes, tests/gpu_megatron/torch/export/test_mcore_qwen3vl.py
  • Did you add or update any necessary documentation? Yes, see docs/source/deployment/3_unified_hf.rst
  • Did you update Changelog? Yes, see CHANGELOG.rst

Additional Information

Companion Megatron-LM PR adds Qwen3VLModel, Qwen3VLDataset, and pretrain_qwenvl.py. Please see this PR NVIDIA/Megatron-LM#3444

Summary by CodeRabbit

  • New Features

    • Added Megatron Core export/import mapping support for Qwen3‑VL, including handling of language‑model weight prefixes and support for both dense and MoE variants.
  • Documentation

    • Added Qwen3‑VL (FP8 / NVFP4) to the deployment support matrix for TensorRT‑LLM.
  • Tests

    • Added GPU tests validating registration, import/export mapping symmetry, and expected layer transformation behavior for Qwen3‑VL.

Review Change Stack

@hychiang-git hychiang-git requested a review from a team as a code owner May 13, 2026 19:21
@hychiang-git hychiang-git requested a review from ChenhanYu May 13, 2026 19:21
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds bidirectional Megatron Core ↔ Hugging Face weight mappings for Qwen3-VL, registers them as a plugin, includes tests validating import/export symmetry and prefix rules, and updates changelog and deployment docs for Qwen 3‑VL support.

Changes

Qwen3-VL Megatron Core Integration

Layer / File(s) Summary
Mapping Module Definition
modelopt/torch/export/plugins/mcore_qwen3vl.py
New module introduces qwen3vl_causal_lm_import and qwen3vl_causal_lm_export dictionaries describing HF↔Megatron Core weight conversion, including prefix adjustments (model.language_model. vs root lm_head.), QKV/MLP merged-projection handling, and MoE expert slicing/routing.
Plugin Registration and Wiring
modelopt/torch/export/plugins/mcore_common.py
Imports the new Qwen3-VL mapping functions and registers Qwen3VLForConditionalGeneration in both global export and import mapping dictionaries, wiring the model type to the new conversion handlers.
Test Suite and Validation
tests/gpu_megatron/torch/export/test_mcore_qwen3vl.py
Comprehensive test suite with registration checks, key-presence validation for dense and MoE parameters, prefix behavior verification, transformation type checks for QKV/MLP, export-mapping func_kwargs checks, symmetry validation between import/export, and comparative assertions against Qwen3.
Documentation Updates
CHANGELOG.rst, docs/source/deployment/3_unified_hf.rst
Changelog entry documenting Qwen3-VL Megatron Core support and deployment documentation noting TensorRT-LLM compatibility with FP8 and NVFP4 quantization formats.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding Qwen3VL export/import support to Megatron Core, which aligns with the primary purpose of all file modifications.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed All Python code changes pass security review. No unsafe deserialization, hardcoded trust_remote_code, eval/exec, # nosec bypasses, or new dependencies added. Code is declarative mappings only.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch hungyueh/pr-895

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.rst`:
- Line 136: Move the changelog entry "Add Megatron Core export/import mapping
for Qwen3-VL (``Qwen3VLForConditionalGeneration``) vision-language models..."
out of the 0.42 (2026-03-10) section and place it under the current
unreleased/0.45 section header in CHANGELOG.rst, preserving the existing
formatting and inline code markup; ensure you remove the duplicate from 0.42 and
verify the entry appears exactly once under the 0.45 (unreleased) section.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7f30ba6d-8de4-4386-b197-7f8189e56d24

📥 Commits

Reviewing files that changed from the base of the PR and between 62401e1 and 2423ae7.

📒 Files selected for processing (5)
  • CHANGELOG.rst
  • docs/source/deployment/3_unified_hf.rst
  • modelopt/torch/export/plugins/mcore_common.py
  • modelopt/torch/export/plugins/mcore_qwen3vl.py
  • tests/gpu_megatron/torch/export/test_mcore_qwen3vl.py

Comment thread CHANGELOG.rst Outdated
Comment thread CHANGELOG.rst Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1482/

Built to branch gh-pages at 2026-05-20 16:11 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@hychiang-git
Copy link
Copy Markdown
Contributor Author

/claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 13, 2026

Claude Review Summary

Small, additive PR that clones the Qwen3 mcore mapping with model.model.language_model. substitution and registers Qwen3VLForConditionalGeneration. Mechanically straightforward; the risk is concentrated in a couple of places.

Findings

  • CRITICAL: 0
  • IMPORTANT: 2
  • SUGGESTION: 1

Most impactful

  1. MoE arch not registered (mcore_qwen3vl.py) — the file ships MoE rules (router, local_experts.*) and the comment claims support for "Qwen3-VL MoE variants like 30B-A3B", but only the dense Qwen3VLForConditionalGeneration arch is wired up in mcore_common.py. Either add the MoE arch entry (mirroring Qwen3MoeForCausalLM) or drop the MoE rules to avoid dead code that implies unsupported behavior.

  2. lm_head placement — root-level lm_head. is inherited from mcore_qwen.py, but for several recent *ForConditionalGeneration VLMs in transformers, lm_head lives at model.language_model.lm_head.. If that holds for the Qwen3-VL release you target, import silently misses the tensor and export writes to the wrong key. PR description says round-trip was verified, so this may be fine — but worth a one-time safe_open(...).keys() confirmation.

  3. Test placementtest_mcore_qwen3vl.py is dict-shape inspection only, doesn't need GPU or Megatron, but lives in tests/gpu_megatron/. Belongs in tests/unit/torch/export/ so it runs in the fast pre-merge lane (matches what the PR description said). Also note these tests assert "the dict we wrote equals the dict we wrote" — a small integration test against a real HF state-dict snapshot would catch the lm_head issue above.

Risk: Low-to-moderate. Code is purely additive, no existing arch behavior changes. Worst case is a broken Qwen3-VL round-trip that only manifests at runtime — which is exactly why the test placement matters.

Comment thread modelopt/torch/export/plugins/mcore_qwen3vl.py Outdated
Comment thread tests/gpu_megatron/torch/export/test_mcore_qwen3vl.py Outdated
Comment thread modelopt/torch/export/plugins/mcore_qwen3vl.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.85%. Comparing base (e4dc020) to head (74019e3).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1482      +/-   ##
==========================================
+ Coverage   76.79%   76.85%   +0.06%     
==========================================
  Files         474      475       +1     
  Lines       51578    51592      +14     
==========================================
+ Hits        39607    39650      +43     
+ Misses      11971    11942      -29     
Flag Coverage Δ
unit 52.65% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Add Megatron Core export/import mapping for Qwen3-VL
(Qwen3VLForConditionalGeneration). Handles the model.language_model.
weight prefix and supports both dense and MoE variants.

Signed-off-by: Hung-Yueh <hungyuehc@nvidia.com>

mv test_mcore_qwen3vl.py to tests/gpu_megatron/torch/export/

Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
hychiang-git and others added 4 commits May 13, 2026 21:39
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
@hychiang-git
Copy link
Copy Markdown
Contributor Author

/claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 14, 2026

Claude Review Summary

Findings: CRITICAL: 0 · IMPORTANT: 0 · SUGGESTION: 2

The change is small and additive: it registers a new HF→MCore mapping for Qwen3VLForConditionalGeneration that mirrors the existing Qwen3 mapping but threads the model.language_model. prefix used by the VL state dict (with lm_head. kept at root, matching the HF wrapper). No existing keys, defaults, or modelopt_state schema are touched, so backward-compatibility risk is essentially nil. The QKV/gated-MLP merging/slicing classes are reused, and dense + MoE keys are covered consistently in import and export.

Notes

  • Test placement (suggestion): tests/gpu_megatron/torch/export/test_mcore_qwen3vl.py only inspects dict structure (no GPU, no Megatron, no checkpoint round-trip). It belongs in tests/unit/torch/export/ — that's also where the PR description says it lives. Moving it gets these structural checks into pre-merge CPU CI.
  • Duplication (suggestion): mcore_qwen3vl.py is nearly a copy of mcore_qwen.py with model.model.language_model.. A small derivation helper would keep them in lockstep when Qwen3 mappings evolve.

Risk

Low. The change is isolated to an additive plugin entry and a new file; the round-trip has been validated by the author against Qwen/Qwen3-VL-8B-Instruct. LGTM with the two suggestions above as follow-ups.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review passed — no significant issues found. LGTM (two non-blocking suggestions left inline).

Comment thread tests/gpu_megatron/torch/export/test_mcore_qwen3vl.py Outdated
Comment thread modelopt/torch/export/plugins/mcore_qwen3vl.py Outdated
Replace the hand-written dict literals in mcore_qwen3vl.py with a helper
that derives the VL mapping from qwen3_causal_lm_import/export by inserting
'language_model.' after 'model.' in every prefix.  lm_head. (root-level) is
left unchanged.  Remove TestQwen3VLvsQwen3Difference since it now tests the
implementation against itself.  Note visual encoder (model.visual.*) is
intentionally excluded from the mapping.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
IMPORTANT fixes:
- Make Qwen3VLConfig / Qwen3VLForConditionalGeneration lazy imports inside
  get_tiny_qwen3vl() and _test_unified_export_megatron() so older transformers
  builds do not break collection of every test importing transformers_models.py
- Drop unsupported MoE claim from CHANGELOG; only Qwen3VLForConditionalGeneration
  (dense) is registered in the mcore dispatch tables

Suggestions:
- Copy func_kwargs in _with_language_model_prefix to avoid shared-dict
  mutation between Qwen3 and Qwen3VL mapping entries
- Add unquantized qwen3vl export test case alongside the FP8 one
- Add torch.distributed.barrier() before the rank-0 vision-weight merge block
  to keep the code safe if TP > 1 is used later

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
@hychiang-git
Copy link
Copy Markdown
Contributor Author

/claude review

Comment on lines +16 to +28
"""Custom mapping from Qwen3-VL Hugging Face models to Megatron Core models.

Qwen3-VL differs from Qwen3 in one structural way: language-model weights live
under ``model.language_model.`` instead of ``model.``, while ``lm_head.weight``
remains at the root level. The mappings below are derived automatically from
the Qwen3 mappings by inserting ``language_model.`` after ``model.`` for every
prefix that starts with ``model.``.

Note: the visual encoder (``model.visual.*``) is intentionally excluded — this
mapping covers only the language-model decoder used for quantization and export.

Reference: https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct/blob/main/model.safetensors.index.json
"""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SUGGESTION] Scope-clarifying note: Qwen3-VL ships in two architectures — Qwen3VLForConditionalGeneration (dense) and Qwen3VLMoeForConditionalGeneration (MoE, e.g. Qwen/Qwen3-VL-30B-A3B-Instruct). This PR only registers the dense variant.

The MoE variant cannot reuse qwen3_causal_lm_export with a prefix rewrite because Qwen3-VL-MoE stores experts in fused form (mlp.experts.gate_up_proj/mlp.experts.down_proj as 3-D tensors) rather than the per-expert layout (mlp.experts.{}.down_proj) that qwen3_causal_lm_* assumes. So this is a real limitation, not just a missing registration.

Consider adding a one-line note to the module docstring (e.g. "Covers the dense Qwen3VL variant only; Qwen3VLMoeForConditionalGeneration uses a fused-expert layout and requires a separate mapping.") so that users hitting KeyError: 'Qwen3VLMoeForConditionalGeneration' in _populate_rule_book know it's intentional and what's missing.

Comment on lines +184 to +186
if model_type == "qwen3vl":
torch.distributed.barrier()
if model_type == "qwen3vl" and rank == 0:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SUGGESTION] Asymmetric barrier: only the qwen3vl branch calls torch.distributed.barrier() and only rank 0 does the merge + load, but there's no second barrier after the rank-0 work. With the current dist_workers_size_1 (size=1) parametrization this is harmless, but if this test is ever extended to multi-rank size, non-zero ranks will exit _test_unified_export_megatron while rank 0 is still loading the merged checkpoint, and tmp_path may be torn down by the fixture before rank 0 finishes. A trailing torch.distributed.barrier() at the end of the rank-0 block (outside the rank == 0 guard) would future-proof this.

prefix = m.target_name_or_prefix
if prefix.startswith("model."):
prefix = "model.language_model." + prefix[len("model.") :]
result[key] = type(m)(target_name_or_prefix=prefix, func_kwargs=dict(m.func_kwargs))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SUGGESTION] dict(m.func_kwargs) is a shallow copy — the inner parallel_config ParallelConfig instance is still shared with the source Qwen3 mapping. That's fine today because ParallelConfig is treated as immutable by all consumers, but it means any future code that mutates func_kwargs["parallel_config"] on a Qwen3VL entry would silently bleed into the Qwen3 entry as well. Since the cost is zero, copy.deepcopy(m.func_kwargs) would be more defensive.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review — LGTM with minor suggestions.

Findings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 3 (all inline)

Summary of review:

  • The mapping derivation is correct. Verified against Qwen/Qwen3-VL-8B-Instruct's model.safetensors.index.json: language-model weights live under model.language_model.*, vision encoder under model.visual.*, and lm_head.weight is at root — which matches the prefix-rewrite logic exactly.
  • The string-based prefix rewrite is currently safe because every model.-prefixed entry in qwen3_causal_lm_{import,export} is a language-model component and lm_head. (root) is left untouched.
  • unified_export_megatron.py already handles multimodal configs by reading from text_config (line 146), so the new arch slots in cleanly without further plumbing.
  • Test changes correctly thread text_config through the export path and merge in the vision tower for the round-trip from_pretrained sanity check.

Inline suggestions (non-blocking):

  1. Qwen3VLMoeForConditionalGeneration is unsupported — the MoE variant uses a fused expert layout (mlp.experts.gate_up_proj / mlp.experts.down_proj as 3-D tensors) and can't reuse the dense Qwen3 mapping. A one-line scope note in the module docstring would save users a confusing KeyError.
  2. Asymmetric barrier in the qwen3vl test branch — currently fine under dist_workers_size_1 but would race if the test is ever extended to multi-rank.
  3. dict(m.func_kwargs) is a shallow copy; copy.deepcopy would be more defensive against future mutation.

Risk assessment: Low. Additive change, dense Qwen3-VL only, no impact on existing model paths.

- Add docstring note that Qwen3VLMoeForConditionalGeneration is unsupported
  (MoE variant uses 3-D fused expert tensors incompatible with dense mapping)
- Replace dict(m.func_kwargs) with copy.deepcopy for full isolation of copied
  mapping entries
- Fix asymmetric barrier: fold rank==0 check under the barrier so all ranks
  participate before any rank writes vision weights

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
@hychiang-git
Copy link
Copy Markdown
Contributor Author

/claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 18, 2026

Claude review summary

Findings: CRITICAL 0, IMPORTANT 0, SUGGESTION 2

Highlights:

  • Algorithm is correct. The helper rewrites every model.-prefixed target to model.language_model., leaving lm_head. at root, which matches the Qwen3-VL HF safetensors layout. Registration in all_mcore_hf_export_mapping and all_mcore_hf_import_mapping flows through _populate_rule_book correctly. The exporter pre-existing text_config handling (getattr(self._hf_config, 'text_config', self._hf_config)) already serializes nested Qwen3VLConfig.text_config updates.
  • Change is purely additive: no public API changes, no modelopt_state schema impact, no risk to existing checkpoints.
  • The MoE variant Qwen3VLMoeForConditionalGeneration is correctly excluded with an in-file comment, since fused 3-D expert tensors need a dedicated mapping. The auto-generated CodeRabbit release notes that claim MoE support are inaccurate, but that is a release-note blurb issue, not code.

Inline suggestions:

  1. _merge_vision_weights in the test writes model-vision.safetensors but does not update model.safetensors.index.json. HF from_pretrained loads via the index, so vision encoder weights silently fall back to random init. The forward pass uses input_ids only, so the test still passes but does not actually verify the vision portion of the round-trip.
  2. _with_language_model_prefix reconstructs each mapping via type(m)(target_name_or_prefix=..., func_kwargs=...), which couples to the current CustomModuleMapping subclass init signatures. copy.deepcopy(m) plus in-place prefix rewrite is more robust to future subclass changes.

Risk: low. Additive plugin, well-contained, with unit and manual integration test coverage. Both findings are non-blocking suggestions.

Comment on lines +78 to +92
def _merge_vision_weights(src_dir: Path, dst_dir: Path) -> None:
"""Copy model.visual.* tensors from src safetensors into dst_dir.

The mcore export only writes language-model weights. To produce a complete
Qwen3-VL checkpoint the vision-encoder weights must be merged from the
original pretrained checkpoint.
"""
vision_tensors = {}
for sf in sorted(src_dir.glob("*.safetensors")):
with safe_open(str(sf), framework="pt", device="cpu") as f:
for key in f.keys(): # noqa: SIM118
if key.startswith("model.visual."):
vision_tensors[key] = f.get_tensor(key)
if vision_tensors:
save_file(vision_tensors, str(dst_dir / "model-vision.safetensors"))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SUGGESTION] _merge_vision_weights writes model-vision.safetensors but does not update model.safetensors.index.json. The exporter writes the index in save_safetensors_by_layer_index / save_safetensors, and Qwen3VLForConditionalGeneration.from_pretrained uses that index when one is present — any safetensors file not referenced by the index is ignored.

Net effect on this test:

  • The any(k.startswith("model.visual.") for k in keys) assertion only checks that the file exists on disk, not that it is discoverable by from_pretrained.
  • Because the forward pass below uses input_ids only (no pixel_values), the vision encoder is never exercised, so randomly-initialised vision weights would not cause a failure.

So the test claims to validate a complete Qwen3-VL checkpoint round-trip, but in practice it only validates the language-model portion. Consider either (a) rewriting the index file after merging vision weights and verifying vl_model.model.visual.patch_embed.proj.weight (or similar) round-trips bit-exact, or (b) tightening the docstring/assertions to make clear this is a language-model-only round-trip and the vision merge is a placeholder for the manual workflow.

Comment on lines +42 to +59
mapping: dict[str, CustomModuleMapping],
) -> dict[str, CustomModuleMapping]:
"""Derive a VL mapping from a base Qwen3 mapping.

Rewrites every ``target_name_or_prefix`` that starts with ``model.`` to
``model.language_model.<rest>``. Prefixes that do not start with
``model.`` (e.g. ``lm_head.``) are left unchanged.
"""
result = {}
for key, m in mapping.items():
prefix = m.target_name_or_prefix
if prefix.startswith("model."):
prefix = "model.language_model." + prefix[len("model.") :]
result[key] = type(m)(
target_name_or_prefix=prefix, func_kwargs=copy.deepcopy(m.func_kwargs)
)
return result

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SUGGESTION] Reconstructing each mapping via type(m)(target_name_or_prefix=..., func_kwargs=...) works today only because every CustomModuleMapping subclass in mcore_custom.py happens to share the same (target_name_or_prefix, func_kwargs) init signature. If a future subclass adds another required constructor arg (or changes the keyword names), this loop will silently break or drop information.

A more robust pattern is to deep-copy the original mapping and just rewrite the prefix:

def _with_language_model_prefix(
    mapping: dict[str, CustomModuleMapping],
) -> dict[str, CustomModuleMapping]:
    result = {}
    for key, m in mapping.items():
        new_m = copy.deepcopy(m)
        if new_m.target_name_or_prefix.startswith("model."):
            new_m.target_name_or_prefix = (
                "model.language_model." + new_m.target_name_or_prefix[len("model.") :]
            )
        result[key] = new_m
    return result

This preserves func_name and any future fields automatically and removes the implicit coupling to subclass __init__ signatures.

@hychiang-git
Copy link
Copy Markdown
Contributor Author

/ok to test e8e2d7b

@hychiang-git
Copy link
Copy Markdown
Contributor Author

/ok to test 1a86b05

Comment thread tests/_test_utils/torch/transformers_models.py
Comment thread tests/gpu_megatron/torch/export/test_unified_export_megatron.py
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how long does running all these tests take? It should be printed at the end of pytest command so we can see if these tests are too slow or fine

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is printed at the end

============================= test session starts ==============================                                                                                                                                                                                                                                                                           
platform linux -- Python 3.12.3, pytest-9.0.3, pluggy-1.6.0 -- /opt/venv/bin/python                                                                                                                                                                                                                                                                        
cachedir: .pytest_cache                                                                                                                                                                                                                                                                                                                                    
hypothesis profile 'default' -> database=InMemoryExampleDatabase({})                                                                                                                                                                                                                                                                                       
rootdir: /workspace/modules/Model-Optimizer                                                                                                                                                                                                                                                                                                                
configfile: pyproject.toml                                                                                                                                                                                                                                                                                                                                 
plugins: hydra-core-1.3.2, anyio-4.9.0, timeout-2.4.0, mock-3.15.1, langsmith-0.7.30, typeguard-4.5.0, rerunfailures-16.1, xdist-3.8.0, shard-0.1.2, flakefinder-1.1.0, hypothesis-6.130.8                                                                                                                                                                 
collecting ... collected 18 items                                                                                                                                                                                                                                                                                                                          
Running 18 items in this shard: tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[nemotron-None-None-None], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[nemotron-None-NVFP4_DEFAULT_CFG-None], tests/gpu_megatron/torch/export/test_unified_export_megatron.p
y::test_unified_export_megatron[nemotron-None-NVFP4_DEFAULT_CFG-FP8_KV_CFG], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[nemotron-eagle-None-None], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[nemotron-medusa-None-None], tests/gpu_megatron/torch/ex
port/test_unified_export_megatron.py::test_unified_export_megatron[llama-None-None-None], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[llama-None-FP8_DEFAULT_CFG-None], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[llama-None-FP8_DEFAULT_CFG-FP8_KV_C
FG], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[llama-eagle-None-None], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[llama-medusa-None-None], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[qwen3vl-Non
e-None-None], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[qwen3vl-None-FP8_DEFAULT_CFG-None], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_import_megatron[llama], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_import_megatron[qwen3vl
], tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_qkv_slicing_gqa_tp2, tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_mtp_state_dict_single_safetensors, tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_mtp_state_dict_no_mtp_keys, tests/gpu_megatron/torch/export/test_unified_expor
t_megatron.py::test_mtp_state_dict_index_file                                                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                                                                                           
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[nemotron-None-None-None] PASSED [  5%]                                                                                                                                                                                                                       
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[nemotron-None-NVFP4_DEFAULT_CFG-None] PASSED [ 11%]                                                                                                                                                                                                          
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[nemotron-None-NVFP4_DEFAULT_CFG-FP8_KV_CFG] PASSED [ 16%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[nemotron-eagle-None-None] PASSED [ 22%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[nemotron-medusa-None-None] PASSED [ 27%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[llama-None-None-None] PASSED [ 33%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[llama-None-FP8_DEFAULT_CFG-None] PASSED [ 38%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[llama-None-FP8_DEFAULT_CFG-FP8_KV_CFG] PASSED [ 44%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[llama-eagle-None-None] PASSED [ 50%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[llama-medusa-None-None] PASSED [ 55%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[qwen3vl-None-None-None] PASSED [ 61%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[qwen3vl-None-FP8_DEFAULT_CFG-None] PASSED [ 66%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_import_megatron[llama] PASSED [ 72%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_import_megatron[qwen3vl] PASSED [ 77%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_qkv_slicing_gqa_tp2 PASSED [ 83%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_mtp_state_dict_single_safetensors PASSED [ 88%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_mtp_state_dict_no_mtp_keys PASSED [ 94%]
tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_mtp_state_dict_index_file PASSED [100%]

=============================== warnings summary ===============================
...
...
================== 18 passed, 30 warnings in 72.11s (0:01:12) ==================

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you look at following section in the pytest output

============================= slowest 50 durations =============================
...

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, 72.11s for all tests seems reasonable

hychiang-git and others added 3 commits May 19, 2026 16:09
Per reviewer feedback: the imports are now protected by the module-level
pytest.importorskip("transformers") guard (transformers_models.py) and by the
gpu_megatron test environment assumption (test file), so top-level imports are
the right style rather than lazy per-call imports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
hychiang-git and others added 5 commits May 19, 2026 13:24
Signed-off-by: hychiang <hungyuehc@nvidia.com>
Top-level import of transformers.models.qwen3_vl fails on older transformers
builds that don't have the submodule, breaking collection of every test that
imports transformers_models.py. Use try/except so the module loads cleanly and
get_tiny_qwen3vl() calls pytest.skip() when the classes are unavailable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
This reverts commit 10038f0.

Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
This reverts commit 63a229a.

Signed-off-by: Hung-Yueh Chiang <hungyuehc@nvidia.com>
@hychiang-git
Copy link
Copy Markdown
Contributor Author

\ok to test 74019e3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants